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Abstract 

We propose a novel Riemannian preconditioning approach for the tensor com¬ 
pletion problem with rank constraint. A Riemannian metric or inner product is 
proposed that exploits the least-squares structure of the cost function and takes 
into account the structured symmetry in Tucker decomposition. The specific met¬ 
ric allows to use the versatile framework of Riemannian optimization on quotient 
manifolds to develop a preconditioned nonlinear conjugate gradient algorithm for 
the problem. To this end, concrete matrix representations of various optimization- 
related ingredients are listed. Numerical comparisons suggest that our proposed 
algorithm robustly outperforms state-of-the-art algorithms across different prob¬ 
lem instances encompassing various synthetic and real-world dataset^] 


1 Introduction 


This paper addresses the problem of low-rank tensor completion when the rank is a priori known 
or estimated. Without loss of generality, we focus on 3-order tensors. Given a tensor T" lXn2Xn3 , 
whose entries AT* l2 Vj are only known for some indices * 3 ) G U, where U is a subset of 

the complete set of indices {(* 1 ,^ 2 , * 3 ) : id G {1 ,... ,rid},d G {1,2,3}}, the fixed-rank tensor 
completion problem is formulated as 


min W\W V n{X)-Vn{X*)f F 

^El"l > <" 2 x»3 |S2| 

subject to rank(A') = r, 


(1) 


where the operator Vn(X) ili2i3 = X ili2i3 if (ii, * 2 , * 3 ) G fi and Vn{X) ili2i3 = 0 otherwise and 
(with a slight abuse of notation) || • || jr is the Frobenius norm. rank(AT) (= r = (ri, 7 - 2 , 7 ^ 3 )), called 
the multilinear rank of X , is the set of the ranks of for each of mode-d unfolding matrices, r,/ <C rid 
enforces a low-rank structure. The mode is a matrix obtained by concatenating the mode-d fibers 
along columns, and mode-d unfolding of X is £ jjndxnd+i—nom—n<j-i f or d = {1,..., D}. 

Problem Q has many variants, and one of those is extending the nuclear norm regularization ap¬ 
proach from the matrix case El to the tensor case. This results in a summation of nuclear norm 
regularization terms, each one corresponds to each of the unfolding matrices of X. While this 
generalization leads to good results mm, its applicability to large-scale instances is not trivial, 
especially due to the necessity of high-dimensional singular value decomposition computations. A 
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different approach exploits Tucker decomposition 0 Section 4] of a low-rank tensor X to develop 
large-scale algorithms for ([lj, e.g., in 17] [ 8 l. 

The present paper exploits both the symmetry present in Tucker decomposition and the least-squares 
structure of the cost function of (lj to develop a competitive algorithm. To this end, we use the 
concept of preconditioning. Whi e preconditioning in unconstrained optimization is well studied 
0 Chapter 5], preconditioning on constraints with symmetries, owing to non-uniqueness of Tucker 
decomposition J 6 ] Section 4.3], is not straightforward. We build upon the recent work flOl that 
suggests to use Riemannian preconditioning with a tailored metric (inner product) in the Riemannian 
optimization framework on quotient manifolds mmm. Use of Riemannian preconditioning for 
the low-rank matrix completion problem is discussed in m, where a preconditioned nonlinear 
conjugate gradient algorithm is proposed. It connects to state-of-the-art algorithms in [15 , 16] and 
shows competitive performance. In this paper, we generalize the work M to tensor completion. 

The paper is organized as follows. Section[2]discusses the two fundamental structures of symmetry 
and least-squares associated with 0 and proposes a novel metric that captures the relevant second- 
order information of the problem. The optimization-related ingredients on the Tucker manifold are 
developed in Section [3] The cost function specific ingredients are developed in Section [4] The final 
formulas are listed in Table[I] In Section[5] numerical comparisons with state-of-the-art algorithms 
on various synthetic (both small and large-scale instances) and real-world benchmarks suggest a 
superior performance of our proposed algorithm. Our proposed preconditioned nonlinear conjugate 
gradient algorithm is implemented in the Matlab toolbox Manopt E). 

The concrete developments of optimization-related ingredients and additional numerical experi¬ 
ments are shown in Sections [A] and [B] respectively, of the supplementary material section. 


2 Exploiting the problem structure 

Construction of efficient algorithms depends on properly exploiting both the structure of constraints 
and cost function. To this end, we focus on two fundamental structures in {T}: symmetry in the 
constraints, and the least-squares structure of the cost function. Finally, a novel metric is proposed. 

The quotient structure of Tucker decomposition. The Tucker decomposition of a tensor X £ 

jnixn 2 xn 3 0 f r ( = ( ri) y 2 , r3 )) i s Jg] Section 4.1] 

X = ( 5 X 1 U 1 X 2 U 2 X 3 U 3 , (2) 

where U<* £ St (r d ,n d ) for d £ {1,2,3} belongs to the Stiefel manifold of matrices of size rid x 
r d with orthogonal columns and Q £ l riXr2 X r 3, Here, VV x d V £ R"ix---n d _ixmxr ! „ J+ iX'"njv 
computes the d-mode product of a tensor W G R n ix---xnjv anc | a ma trix V £ 

Tucker decomposition |2]) is not unique as X remains unchanged under the transformation 

(U-i, U 2 , U 3 , G) ^ (UrOr, U 2 0 2 , U 3 O 3 , Qx 1 Of x x 3 Oj) (3) 

for all Od £ 0(r d ), the set of orthogonal matrices of size of r d x r d . The classical remedy to 

remove this indeterminacy is to have additional structures on Q like sparsity or restricted orthogonal 
rotations 0 Section 4.3]. In contrast, we encode the transformation ([3| in an abstract search space 
of equivalence classes, defined as, 

[(Ui,U 2 ,U 3 ,S)] :={(U 1 0 i,U 2 0 2 ,U 303 ,gx 1 0 fx 20 jx 3 0 j): 0 d e 0 (r d )}. (4) 

The set of equivalence classes is the quotient manifold iTTS] Theorem 9.16] 

M/~ := M/(0(n)xO(q)xO(4 (5) 

where A4 is called the total space (computational space) that is the product space 

M := St(ri,ni) x St(r 2 ,n 2 ) x St(r 3 ,n 3 ) x Rrixrsxra. ( 6 ) 

Due to the invariance (|3j, the local minima of ({T} in A4 are not isolated, but they become isolated on 
A4/~. Consequently, the problem 0 is an optimization problem on a quotient manifold for which 

2 The Matlab code is available at http: //bamdevmishra . com/codes/tensorcompletion/ 
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systematic procedures are proposed in lUTl [T2l [T31 by endowing ,M/~ with a Riemannian structure. 
We call M ./ ~, defined in |5]), the Tucker manifold as it results from Tucker decomposition. 

The least-squares structure of the cost function. In unconstrained optimization, the Newton 
method is interpreted as a scaled steepest descent method, where the search space is endowed 
with a metric (inner product) induced by the Hessian of the cost function J9[. This induced metric 
(or its approximation) resolves convergence issues of first-order optimization algorithms. Analo¬ 
gously, finding a good inner product for |T|i is of profound consequence. Specifically for the case of 
quadratic optimization with rank constraint (matrix case), Mishra and Sepulchre flOl Section 5] pro¬ 
pose a family of Riemannian metrics from the Hessian of the cost function. Applying this approach 
directly for the particular cost function of 0 is computationally costly. To circumvent the issue, 
we consider a simplified cost function by assuming that Cl contains the full set of indices, i.e., we 
focus on |j X — to propose a metric candidate. Applying the metric tuning approach of QQJ 

Section 5] to the simplified cost function leads to a family of Riemannian metrics. A good trade-off 
between computational cost and simplicity is by considering only the block diagonal elements of 
the Hessian of \\X — AT*|||,. It should be noted that the cost function \\X — is convex and 

quadratic in X. Consequently, it is also convex and quadratic in the arguments (Ui, U 2 , U 3 , <5) 
individually. Equivalently, the block diagonal approximation of the Hessian of \\X — X*\\' 2 F in 
(Ui,U 2 ,U 3 ,S)is 

((GiGf ) 0 I ni , (G 2 G^) ® I„ 2 , (G 3 GJ) 0 I n3 , l rir2r3 ), (7) 

where G,/ is the mode-d unfolding of Q and is assumed to be full rank. The terms G^Gj for 
d £ {1,2,3} are positive definite when n < r 2 r 3 , r 2 < r-pr^, and < rir 2 , which is a reasonable 
modeling assumption. 

A novel Riemannian metric. An element x in the total space M has the matrix representation 
(Ui, U 2 , U 3 , Q). Consequently, the tangent space T X M. is the Cartesian product of the tangent 
spaces of the individual manifolds of 0 . i e., T X M. has the matrix characterization lU3l 

T X M = { , Z Uo , Z Us , Zg ) e K niXri X X K ra 3 Xr 3 x ri xr 2 xr 3 . 

UjZ Ud + Zu d Ud = 0 , for d £ { 1 , 2 ,3}}. (8) 

From the earlier discussion on symmetry and least-squares structure, we propose the novel metric 

x T X M —> K 

gx{ix,r]x) = (^Ui,? 7 Ui(GiGf)) + (Cu 2 > 7 ?U 2 {G 2 G^)) + (Cu 3 > 7 7u 3 ( G 3 GJ)) + {£g,r)g), (9) 

where f; x ,r]x £ T X A4 are tangent vectors with matrix characterizations, shown in ©■ 
(Cui, £u 2 , Cu 3 , ig ) and (ryui I t?u 2 I ? 7 u 3 : r yg ) 5 respectively and (•,•) is the Euclidean inner product. 
It should be emphasized that the proposed metric (|9]) is induced from (J7J. 

3 Notions of optimization on the Tucker manifold 


v x 



Figure 1: Riemannian optimization framework: geometric objects, shown in dotted lines, on the 
quotient manifold Ad/~ call for matrix representatives, shown in solid lines, in the total space A4. 

Each point on a quotient manifold represents an entire equivalence class of matrices in the total 
space. Abstract geometric objects on a quotient manifold call for matrix representatives in the total 
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space. Similarly, algorithms are run in the total space A4, but under appropriate compatibility be¬ 
tween the Riemannian structure of M. and the Riemannian structure of the quotient manifold A4 / ~, 
they define algorithms on the quotient manifold. The key is endowing M./ ~ with a Riemannian 
structure. Once this is the case, a constraint optimization problem, for example G- is conceptually 
transformed into an unconstrained optimization over the Riemannian quotient manifold ([5]). Below 
we briefly show the development of various geometric objects that are required to optimize a smooth 
cost function on the quotient manifold 0 with first-order methods, e.g., conjugate gradients. 

Quotient manifold representation and horizontal lifts. Figure [T] illustrates a schematic view of 
optimization with equivalence classes, where the points x and y in A4 belong to the same equiva¬ 
lence class (shown in solid blue color) and they represent a single point [x] := {y £ A4 : y ~ a;} on 
the quotient manifold AAj The abstract tangent space TLi (A4 / ~) at [as] £ A4 / ~ has the matrix 
representation in T X M, but restricted to the directions that do not induce a displacement along the 
equivalence class [a;]. This is realized by decomposing T X M into two complementary subspaces, 
the vertical and horizontal subspaces. The vertical space V x is the tangent space of the equivalence 
class [x]. On the other hand, the horizontal space 'H , is the orthogonal subspace to V x in the sense 
of the metric (j9]l. Equivalently, T X A4 = V x CD 'H x . The horizontal subspace provides a valid matrix 
representation to the abstract tangent space TLijyVf/ ~) ifTTl Section 3.5.8], An abstract tangent 
vector £[.,.] £ Tj x ] (A4 / ~) at [x\ has a unique element £ x £ T~L X that is called its horizontal lift. 

A Riemannian metric g x : T X A4 x T X M. —> R at x £ AA defines a Riemannian metric 
9[x] ■ T[x](M/ ~) x T [x ](M /~) ->• K, i.e., g[ x ]{^[ x ],V[x]) := 9x{^x,Vx) on the quotient mani¬ 
fold A4/ ~, if (j x (f x , ij x ) does not depend on a specific representation along the equivalence class 
[4 Here, <£ x i and n x ] are tangent vectors in T\ x i(A4/ ~), and and rj x are their horizontal 
lifts in 7 -L x at x, respectively. Equivalently, the definition of the Riemannian metric is well posed 
when g x {f x , ( x ) = 9x{fy, ( y ) for all x,y £ [x], where £ x ,( x £ T~L X and £ y X v £ Tiy are the 
horizontal lifts of £ T^fAi /~) along the same equivalence class [x]. From [11, Propo¬ 

sition 3.6.1], it suffices to show that the metric (|9]) for tangent vectors , Q x £ T X J\A does not 
change under the transformations (Ui, U 2 , U 3 , Q) 1 —> (U 1 0 1 ,U 2 0 2 ,U 3 0 3 ,Sx 1 0?’ x 2 0^ X 3 OJ), 
(CunCusi^Usi^e) ^ KuiOi.^Oa.^Oa.^XiOj’xaOjxaOl’), and ,Cu 2 > Cu 3 1 Cg) ^ 
(Cui0 1 , Cu 2 O 2 , Cu 3 0 3 ,CgXiOf x 2 0 2 x 3 0 2 ). A few straightforward computations show that this 
is indeed the case. Endowed with the Riemannian metric (|9b , the quotient manifold Ai / ~ is a Rie¬ 
mannian submersion of A 4. The submersion principle allows to work out concrete matrix represen¬ 
tations of abstract object on Ad/~, e.g., the gradient of a smooth cost function [TT] Section 3.62]. 

Starting from an arbitrary matrix (with appropriate dimensions), two linear projections are needed: 
the first projection 4> x is onto the tangent space T X A4, while the second projection I l x is onto the 
horizontal subspace 7f x . The computation cost of these projections is 0(nirf + n 2 r 2 + n 3 r|). 

The tangent space T X A4 projection operation is obtained by extracting the component nor¬ 
mal to T X AA in the ambient space. The normal space N x Jvi has the matrix characterization 
{(U 1 S Ol (G 1 G?’)- 1 ,U 2 S U2 (G 2 Gi’)- 1 ) U 3 S U 3(G 3 G^)- 1 ,0) : S Ud =S Vd , for d £ 

{1,2,3}}. Symmetric matrices Su d for all d £ {1,2,3} parameterize the normal space. Finally, 
the operator : R n ,x ri x R n 2 xr 2 x K » 3 xr 3 x JRnXraXra TxM . (Yjj,, Yjj 2 , Yu 3 , Yg) 
1 —!> x (Yjj ,. Yi; 2 , Y (i :j , Yg) has the matrix characterization 

’MY Ul ,Y U 2 ,Y lJ 3,Yg) = (Y Dl -U 1 S Dl (G 1 G?’)- 1 ,Y U 2 -U 2 S D 2 (G 2 Gi’)- 1 , 

Y U 3 -U 3 S U 3 (G 3 G^)- 1 ,Yg), 1 J 

where Su d is the solution to the Lyapunov equation Si;,,G,/G (/ + G,/G,, S(i ri = G,/G (/ (Y Ud Ud + 
UjYjjrfjGdGj for d £ {1,2,3}, which are solved efficiently with the Matlab’s lyap routine. 

The horizontal space projection operator of a tangent vector is obtained by removing the compo¬ 
nent along the vertical space. In particular, the vertical space V x has the matrix characterization 

{(Uini,U 2 n 2 ,U 3 n 3 , — (GxiLli + Qx 2 Ll 2 + Qx 3 Ll 3 )) : Lid £ M rdXrd , FiJ = —Lid for d £ 
{l, 2, 3}}. Skew symmetric matrices Lid for all d £ {1,2, 3} parameterize the vertical space. Fi¬ 
nally, the horizontal projection operator 11, : T X A4 > 'H x : rj x 1[,, (r ] x ) has the expression 

n x {r)x) = (? 7 ui — Uilli,pu 2 — V 2 Ll 2 , VV3 - V 3 Ll 3 ,r l g-(-(gx 1 Ll 1 +gx 2 Ll 2 +gx 3 Ll 3 ))), 
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where rj x = (r]v 1 , r]\j 2 , ?/u 3 ,r]g) £ T x Ai and fid is a skew-symmetric matrix of size x tr that is 
the solution to the coupled Lyapunov equations 

' GiGf fli + «iGiGf - Gi(I r3 0 fl 2 )Gf - Gr(n 3 0 I r JGf 

= Skew(UfpuiGiG^) + Skew^ip/jJ, 

G 2 G^ft 2 + r2 2 G 2 G^ - G 2 (I r3 0 Oi)Gj - G 2 (n 3 0 IrJGj 
' =Skew(U^ U2 G 2 G^) + Skew(G 2 ^J, 1 J 

G 3 G lfl 3 + 0 3 G 3 Gj - G 3 (I r2 0 flJGl - G 3 (f2 2 0 I ri )Gg 

= Skew(U^r 7 u 3 G 3 G 3 ) + Skew(G 3 ??£ 3 ), 

where Skew(-) extracts the skew-symmetric part of a square matrix, i.e., Skew(D) = (D D r )/2. 
The coupled Lyapunov equations HD are solved efficiently with the Matlab’s peg routine that is 
combined with a specific preconditioner resulting from the Gauss-Seidel approximation of m 

Retraction. A retraction is a mapping that maps vectors in the horizontal space to points on the 
search space M and satisfies the local rigidity condition ifTTl Definition 4.1], It provides a natural 
way to move on the manifold along a search direction. Because the total space M has the product 
nature, we can choose a retraction by combining retractions on the individual manifolds, i.e., 

Rx{£,x) = (uf(Ui +£ui),uf(U 2 + £u 2 ),uf(U 3 + £u 3 ) 5 <5 + £p), 
where £ R x and uf(-) extracts the orthogonal factor of a full column rank matrix, i.e., 
uf(A) = A(A J A) -1 / 2 . The retraction R x defines a retraction -R[ x ](£[ x ]) := [-R x (£ x )] on the 
quotient manifold Ad/ ~, as the equivalence class [f? x (£ x )] does not depend on specific ma¬ 
trix representations of [x] and £r x i, where f x is the horizontal lift of the abstract tangent vector 
£[x] € T[ X ](M/ ~). 

Vector transport. A vector transport T V: f x on a manifold Ad is a smooth mapping that transports a 
tangent vector £ x £ T x Ad at x £ Ad to a vector in the tangent space at R x (r] x ) fTTl . Section 8.1.4], 
It generalizes the classical concept of translation of vectors in the Euclidean space to manifolds. 
The horizontal lift of the abstract vector transport T V[x] £[ x ] on Ad / ~ has the matrix characterization 
n n*(rix)(£x))’ where and dx are the horizontal lifts in H x of f [x] 
and ? 7 [ x ] that belong to Tj x j (Ad/ ~). The computational cost of transporting a vector solely depends 
on the projection and retraction operations. 

4 Preconditioned conjugate gradient algorithm for ([!]) 

We propose a Riemannian nonlinear conjugate gradient algorithm for the tensor completion problem 
([T} that is based on the developments in Section [3] The preconditioning effect follows from the 
specific choice of the metric 0- The earlier developments allow to use the off-the-shelf conjugate 
gradient implementation of Manopt for any smooth cost function ED- A complete description of the 
Riemannian nonlinear conjugate gradient method is in DU Chapter 8]. The convergence analysis of 
the Riemannian conjugate gradient method follows from (19, 201. The only remaining ingredients 
are the cost function specific ingredients. To this end, we show the computation of the Riemannian 
gradient as well as a way to compute an initial guess for the step-size, which is used in the conjugate 
gradient method. The concrete formulas are shown in Table [I] The total computational cost per 
iteration of our proposed algorithm is 0(|fl|rir 2 r 3 ), where |fljis the number of known entries. 

Riemannian gradient computation. Let f(X) = \\Pn(X) - 'Pn(Af*)|||/|f2| be the mean 
square error function of (|TJ». and S = 2('Pn(C/XiUi x 2 U 2 X 3 U 3 ) — T > n(X*))/\fl\ be an auxil¬ 
iary sparse tensor variable that is interpreted as the Euclidean gradient of / in R 7) | X " 5X " 3 . The 
partial derivatives of the function / with respect to (Ui,U 2 ,U 3 ,<?) are computed in terms of the 
unfolding matrices S r j. Due to the specific scaled metric the partial derivatives are further 
scaled by ((GiGf ) _1 , (G 2 G 2 ") _1 , (G 3 G 3 ) _1 ,Z), denoted as egrad x / (after scaling). Finally, 
from the Riemannian submersion theory tm Section 3.6.2], the horizontal lift of gradj,,.] / is equal 
to grad x / = 'P(egrad x f). Subsequently, 

the horizontal lift of grad[ x ]/ = (Si(U 3 0 U 2 )Gf (G 3 Gf ) _1 — UiBi^ (G 3 Gf ) _1 , 

S 2 (U 3 0 UOG^G.G^)- 1 - U 2 B U2 (G 2 G^)- 1 , 

S 3 (U 2 0U 1 )Gj(G 3 Gj)- 1 - U 3 Bu 3 (G 3 Gg ) _1 , 

SXiUf x 2 I# x 3 U 3 ), 
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Table 1: Ingredients to implement an off-the-shelf conjugate gradient algorithm for Q- 


Matrix representation 

x ■= (Ui. i: 2 . U 3 , Q) 

Computational space Ai 

St(ri,ni) x St(r 2 ,n 2 ) x St(r 3 ,n 3 ) x R r 'ixr- 2 xr 3 

Group action 

{(UrOr, U 2 O 2 , U 3 O 3 , Q x iOfx-.-Of X3O3 j : 

O d £ 0(r d ),forde {1,2,3}} 

Quotient space Ai /~ 

St(ri,ni) x St(r 2 . rij) x St(r 3 ,n 3 ) x 3; n xxr 3 
/(0(rr) x 0(r 2 ) x 0(r 3 )) 

Ambient space 

R" lXn XI" 2Xr! XI" 3Xr3 X 5 , n XX r 3 

Tangent vectors in 

T X M 

{ (ZxJi , Zu 2 , Zu 3 , Zg ) £ ]R ni X ri X r 2X7-2 xR «3Xr 3 xR r- lX r 2 Xr 3 

: U d Zu d + Zl"V d = 0, for d £ {1, 2, 3}} 

Metric g x (£ x ,r] x ) for 
any £r, r/x £ T X M 

(£ui, '/r (GiG'i)) + (£u 2 , '/t: 2 (G 2 G 2 )) + (£u 3 , nv. t (G 3 G 3 )) + (£g, r\g) 

Vertical tangent 
vectors in V x 

|(U 1 1 . U 2 S~i 2 ,U 3 G 3 , -(5xifli + Q x 2 fl 2 + g?x 3 n 3 )) : 

n d eR r * Xr *,n% » -n d , ford e { 1 , 2 , 3 }} 

Horizontal tangent 
vectors in H x 

{(CUi > CUo > Cu 3 > Cs) c T x Ai : 

(G d G d )Cu d U d + CG d Gj is symmetric, for d £ {1,2,3}} 

$(■) projects an ambient 
vector (Y Ul5 Yu 2 , Yu 3 ,Yg) 
onto T x Ai 

(Y i:; - U.VfGiG}') 1 . Y ll2 - U 2 S l:2 (G 2 Gl) 

Yu 3 — UsSu 3 (G 3 G 3 ) \ Yg), where Su d for d £ {1, 2, 3} are computed 
by solving Lyapunov equations as in {To}. 

H(-) projects a tangent 
vector £ onto Hx 

(Cui — Uifii, Gj 2 — U 2 G 2 ,£u 3 — U 3 SJ 3 , 

ig — (— (C/XiOr + C? x 2 S~2 2 + <7x 3 f2 3 ))), fid is computed in 1 11 1 . 

First-order derivative 
of f(x) 

(Si(u 3 x i: 2 )g{. s 2 (u 3 ® Ui)gT,s 3 (u 2 ® Ui)Gr), 

S XiUf x 2 u;r XaUl 1 ), 

where S = ^ {Vn{Qx 1 U 1 x 2 U 2 X 3 U 3 ) - Vn{X*)). 

Retraction R x (£ x ) 

(uf(Ui + £u,),uf(U 2 + £u 2 ),uf(U 3 + £u 3 ),£ +Ce) 

Horizontal lift of the 
vector transport 77 7 w £[ x ] 



where Bu d for d G {1.2. 3} are the solutions to the Lyapunov equations 

f + G 1 GfB Ul = 2Sym(G 1 Gfuf(S 1 (U 3 <g>U 2 )Gf), 

< B U2 G 2 G^ + G 2 G^B U2 = 2Sym(G 2 GM(S2(U 3 ®U 1 )G^), 

[ B U3 G 3 Gj + G 3 G^B U3 = 2Sym(G 3 GM(S 3 (U 2 ®U 1 )G^), 

which are solved efficiently with the Matlab’s lyap routine. Sym(-) extracts the symmetric part of a 
square matrix, i.e., Sym(D) = (D + D 7 )/2. The total numerical cost of computing the Riemannian 
gradient depends on computing the partial derivatives, which is ()(\V-\r-\ r 2 r 3 ). 

Initial guess for the step size. The least-squares structure of the cost function in ([TJ also al¬ 
lows to compute a linearized step-size guess efficiently along a search direction by considering 
a polynomial approximation of degree 2 over the manifold mim. Given a search direction 

£,x e H x , the step-size guess is argmin s6R+ ||'Pfi(<5x :L UiX 2 U 2 x 3 U 3 + sC/Xi^ x 2 U 2 x 3 U 3 + 
X 1 U 1 x 2 £u 2 x 3 U 3 + asxlUix 2 U 2 x 3 &,3 + xiUix 2 U 2 x 3 U 3 ) - Vn(X*)\\ 2 F , which has a 

closed-form expression and the numerical cost of computing it is 0(|O|r* 1 r - 2 r’ 3 ). 

5 Numerical comparisons 

We show a number of numerical comparisons of our proposed Riemannian preconditioned nonlinear 
conjugate algorithm with state-of-the-art algorithms that include TOpt 0 and geomCG (H, for 
comparisons with Tucker decomposition based algorithms, and HaLRTC 0, Latent 0, and Hard 
151 as nuclear norm minimization algorithms. All simulations are performed in Matlab on a 2.6 GHz 
Intel Core i7 machine with 16 GB RAM. For specific operations with unfoldings of <S, we use the 
mex interfaces for Matlab that are provided by the authors of geomCG. For large-scale instances, 
our algorithm is only compared with geomCG as other algorithms cannot handle these instances. 

Since the dimension of the space of a tensor £ ]g^ixrt 2 xn 3 0 f ran j, r _ (r 3 , r 2 , r 3 ) is dim(A"l /~) = 
Y^.=i( n df'd — 7d) + 7"ir 2 r 3 , we randomly and uniformly select known entries based on a multiple 
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of the dimension, called the over-sampling (OS) ratio, to create the training set O. Algorithms (and 
problem instances) are initialized randomly, as in |8), and are stopped when either the mean square 
error (MSE) on the training set 0 is below 10~ 12 or the number of iterations exceeds 250. We also 
evaluate the mean square error on a test set 1’, which is different from f2. Five runs are performed in 
each scenario and the plots show all of them. The time plots are shown with standard deviations. 



(a) Case SI: comparison between metrics. 



Tensor Size (per dimension) 


(d) Case S3. 
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(f) Case S5: CN = {5, 50, 100}. 
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(g) Case S6: noisy data. 
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(i) Case Rl: Ribeira, OS = 11. 


Figure 2: Experiments on synthetic and real datasets. 


Case SI: comparison with the Euclidean metric. We first show the benefit of the proposed metric 
(|9) over the conventional choice of the Euclidean metric that exploits the product structure of M 
and symmetry ([3]). This is defined by combining the individual natural metrics for St(r fJ r, n,i) and 
M n xr 2 xr ' 3 _ p or simulations, we randomly generate a tensor of size 200 x 200 x 200 and rank 
r = (10,10,10). OS is 10. For simplicity, we compare steepest descent algorithms with Armijo 
backtracking linesearch for both the metric choices. Figure [2| a) shows that the algorithm with the 
metric (|9]) gives a superior performance than that of the conventional metric choice. 

Case S2: small-scale instances. Small-scale tensors of size 100 x 100 x 100,150 x 150 x 150, and 
200 x 200 x 200 and rank r = (10,10,10) are considered. OS is {10, 20, 30}. Figure [2jb) shows that 
the convergence behavior of our proposed algorithm is either competitive or faster than the others. 
In Figure |2fc), the lowest test errors are obtained by our proposed algorithm and geomCG. 

Case S3: large-scale instances. We consider large-scale tensors of size 3000 x 3000 x 3000, 
5000 x 5000 x 5000, and 10000 x 10000 x 10000 and ranks r = (5,5, 5) and (10,10,10). OS is 
10. Our proposed algorithm outperforms geomCG in Figure[2jd). 

Case S4: influence of low sampling. We look into problem instances from scarcely sampled data, 
e.g., OS is 4. The test requires completing a tensor of size 10000 x 10000 x 10000 and rank r = 
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(5,5,5). Figure [2je) shows the superior performance of the proposed algorithm against geomCG. 
Whereas the test error increases for geomCG, it decreases for the proposed algorithm. 

Case S5: influence of ill-conditioning and low sampling. We consider the problem instance of 
Case S4 with OS = 5. Additionally, for generating the instance, we impose a diagonal core Q with 
exponentially decaying positive values of condition numbers (CN) 5, 50, and 100. Figure[2jf) shows 
that the proposed algorithm outperforms geomCG for all the considered CN values. 

Case S6: influence of noise. We evaluate the convergence properties of algorithms under the pres¬ 
ence of noise by adding scaled Gaussian noise Vn(£) to Vn{X*) as in J8] Section 4.2.1]. The dif¬ 
ferent noise levels are e = {10~ 4 ,10 —6 ,10 —8 ,10 —10 ,10” 12 }. In order to evaluate for e = 10” 12 , 
the stopping threshold on the MSE of the train set is lowered to 10 -24 . The tensor size and rank are 
same as in Case S4 and OS is 10. Figure |2](g) shows that the test error for each e is almost identical 
to the e 2 \\Pn(X*)\\ 2 F J8j Section 4.2.1], but our proposed algorithm converges faster than geomCG. 

Case S7: asymmetric instances. We consider instances where the dimensions and ranks along 
certain modes are different than others. Two cases are considered. Case (7.a) considers tensors size 
20000 x 7000 x 7000, 30000 x 6000 x 6000, and 40000 x 5000 x 5000 with rank r = (5,5, 5). Case 
(7.b) considers a tensor of size 10000 x 10000 x 10000 with ranks (7,6,6), (10,5, 5), and (15,4,4). 
In all the cases, the proposed algorithm converges faster than geomCG as shown in Figure [2|h). 

Case Rl: hyperspectral image. We consider the hyperspectral image “Ribeira” ll22l discussed in 
ll23l [8l . The tensor size is 1017 x 1340 x 33, where each slice corresponds to an image of a particular 
scene measured at a different wavelength. As suggested in j23l l8l. we resize it to 203 x 268 x 33. 
We compare all the algorithms, and perform five random samplings of the pixels based on the OS 
values 11 and 22, corresponding to the rank r=(15,15, 6) adopted in (8). This set is further randomly 
split into 80/10/10-train/validation/test partitions. The algorithms are stopped when the MSE on the 
validation set starts to increase. While OS = 22 corresponds to the observation ratio of 10% studied 
in J8|, OS = 11 considers a challenging scenario with the observation ratio of 5%. Figures [2ji) 
shows the good performance of our proposed algorithm. Table[2]compiles the results. 

Case R2: MovieLens-10\£| This dataset contains 10000054 ratings corresponding to 71567 users 
and 10681 movies. We split the time into 7-days wide bins results, and finally, get a tensor of size 
71567 x 10681 x 731. The fraction of known entries is less than 0.002%. The tensor completion 
task on this dataset reveals periodicity of the latent genres. We perform five random 80/10/10- 
train/validation/test partitions. The maximum iteration threshold is set to 500. As shown in Table[2] 
our proposed algorithm consistently gives lower test errors than geomCG across different ranks. 


Table 2: Cases Rl and R2: test MSE on T and time in seconds. 


Ribeira 

OS = 11 

OS = 22 

Algorithm 

Time 

MSE on T 

Time 

MSE on T 

Proposed 

33 ± 13 

8.2095 10~ 4 ± 1.7 10~ 5 

67 ± 43 

6.9516 • 10~ 4 ± 1.1 10 -5 

geomCG 

36 ± 14 

3.8342 • 10- 1 ± 4.2 • it) -2 

150 ± 48 

6.2590 • lO -3 ± 4.5 • 10 3 

HaLRTC 

46 ± 0 

2.2671 • 10- 3 ± 3.6 • i"6 -5 

48 ± 0 

1.3880 • lO -3 ± 2.7 • io 5 

TOpt 

80 ± 32 

1.7854 • 10 3 ± 3.8 ■ lO -4 

27 ± 21 

2.1259 ■ IO -3 ± 3.8 ■ 10 -4 

Latent 

553 ± 3 

2.9296 • 10 3 ± 6.4 ■ io -5 

558 ± 3 

1.6339 ■ IO -3 ± 2.3 • 10 3 

Hard 

400 ± 5 

6.5090 • IO 2 ± 6.1 • lO 1 

402 ± 4 

6.5989 • IO 2 ± 9.8 • 10 1 

MovieLens-lOM 


Proposed 


geomCG 

r 

Time 

MSE on T 

Time 

MSE on r 

(4,4,4) 

1748 ± 441 

0.6762 ± 1.5 10~ a 

2981 ± 40 

0.6956±?2.8 ■ 10 -3 

(6,6,6) 

6058 ± 47 

0.6913 ± 3.3 10 3 

6554 ± 655 

0.7398 ::77.1 • 10 4 

(8,8,8) 

11370 ±103 

0.7589 ± 7.1 10 3 

13853 ± 118 

0.8955±?3.3 ■ 10 3 

(10,10, 10) 

32802 ± 52 

1.0107 ± 2.7 • 10 

38145 ± 36 

1.6550 : .'8.7 • 10 2 


6 Conclusion and future work 

We have proposed a preconditioned nonlinear conjugate gradient algorithm for the tensor completion 
problem. The algorithm stems from the Riemannian preconditioning approach that exploits the 


3 http://grouplens.org/datasets/movielens/ 
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fundamental structures of symmetry, due to non-uniqueness of Tucker decomposition, and least- 
squares of the cost function. A novel Riemannian metric (inner product) is proposed that enables 
to use the versatile Riemannian optimization framework. Concrete matrix expressions are worked 
out. Numerical comparisons suggest that our proposed algorithm has a superior performance on 
different benchmarks. As future research directions, we intend to look into ways of updating ranks 
in tensors as well as look into the issue of preconditioning on other tensor decomposition models, 
e.g., hierarchical Tucker decomposition fl24l and tensor networks f25l . 
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Riemannian preconditioning for tensor completion 
supplementary material 


A Derivation of manifold-related ingredients 

The concrete computations of the optimization-related ingredientspresented in the paper are dis¬ 
cussed below. 

The total space is Ad := St(ri,ni) x St(r 2 ,n 2 ) x St ( 73 , 713 ) x K r i xr 2 xr 3_ Each element x G Ad 
has the matrix representation (Ui, U 2 , U 3 , Q). Invariance of Tucker decomposition under the trans¬ 
formation (U|,U 2 ,U 3 .C/) 1-4 (UiOi.UaOa.UaOa.gxiOfxaO^’xaOi’) for all O d G 0(r d ), 
the set of orthogonal matrices of size of r d x r d results in equivalence classes of the form 
N = [(Ui.Ua.Ua.S)] := {(U 1 0 1 ,U 2 0 2 ,U 3 0 3 ,gx 1 0f x 2 0^x 3 0j) : O d G 0(r d )}. 


A.l Tangent space characterization and the Riemannian metric 

The tangent space, T x Ad, at x given by (Ui, U 2 ,U 3 , Q) in the total space Ad is the product space of 
the tangent spaces of the individual manifolds. From in Example 3.5.2], the tangent space has the 
matrix characterization 

T X M = {(Zu^Zu^Z^.Zg) € r iXr > x i" 2Xr2 x l n ’ xr ’ x ri xr!Xr 3 

: UjZ U(i + Zu d Ud = 0, for d G {1,2,3}}. ( 

The proposed metric g x : T x Ad x T x Ad —> K. is 

9x{^x,Vx) = (£ui i » 7 Ui (GiGf )) + (£u 2 ) frtJ 2 (G 2 G^)) + (£u 3 A7u 3 ( G 3 Gj)) + {£g,Vg), (A.2) 

where t; x ,r] x G T x Ad are tangent vectors with matrix characterizations (£ 1.11 ■ £u 2 ■ AiJ:>.• ) and 

(77Ui,?7u 2 , 77jj 3 , i ] g ), respectively and (•, •) is the Euclidean inner product. 


A.2 Characterization of the normal space 

Given a vector in M raiXri x K " 2Xr 2 x R ” 3Xr3 x j ts projection onto the tangent space 

T x Ad is obtained by extracting the component normal , in the metric sense, to the tangent space. 
This section describes the characterization of the normal space, N X M.. 

Let Cx = (CunCuatCua.Ce) £ N x Ad, and r/ x = (pu 1 ,Vv 2 ,W 3 ,9g) € T X M. Since Q x is orthogonal 
to r) x , i.e., g x (Cx,Vx) = 0 , the conditions 

Trace(G rf GjCu rf r/uJ = 0, for d G {1,2,3} (A.3) 

must hold for all rj x in the tangent space. Additionally from CD Example 3.5.2], rju d has the 
characterization 

Vv d = U d n + U d± K, (A.4) 

where $7 is any skew-symmetric matrix, K is a any matrix of size ( n d — r d ) x r d , and Udj_ is any 
n d x (n d — r d ) that is orthogonal complement of U^. Let £u d = (u d G d Gj and let (u d is defined as 

Cu d =U d A + U d± B (A.5) 

without loss of generality, where A G R rdXrd and B G M( na_T '<i) xr '<i are to be characterized from 
( |A.3[ > and ( |A . 4 [ > . A few standard computations show that A has to be symmetric and B = 0. Conse¬ 
quently, Cu d = U d S Ud , where S Ud = S/, ,. Equivalently, ( Ud = U,/Su d (G,/Gj)“ 1 for a symmetric 
matrix Su d - Finally, the normal space N x Ad has the characterization 

N X M = {(U 1 S Ul (G 1 Gf)- 1 ,U 2 S U2 (G2G^)- 1 ,U3S U3 (G 3 G^)- 1 ,0) : 

Su d G R rdXrd ,S^ d = Su d , for d G {1,2,3}}. ^ 
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A.3 Characterization of the vertical space 

The horizontal space projector of a tangent vector is obtained by removing the component along the 
vertical direction. This section shows the matrix characterization of the vertical space V x . 

V x is the defined as the linearization of the equivalence class [(U i. U- 2 - U 3 , Q)\ 
at x = [(Ui, U 2 , U 3 , <?)]. Equivalently, V x is the linearization of 

(U 1 O 1 , U 2 0 2 , U 3 O 3 , 0 xiO^x 2 O^ X 3 Oj) along Od £ Cl(r d ) at the identity element for 
d £ {1,2,3}. From the characterization of linearization of an orthogonal matrix UTl Exam¬ 
ple 3.5.3], we have the characterization for the vertical space as 

14 = {(Uif2i,U 2 f2 2 ,U 3 f2 3 , —+ t?x 2 f2 2 + (5x 3 J1 3 )) : rA 7 t 

n d £ = -n d for de {1,2,3}}. 1 ’ 


A.4 Characterization of the horizontal space 


The characterization of the horizontal space 1~l x is derived from its orthogonal relationship with the 
vertical space V x . 


Let £ x = (£ui, £u 2 , £u 3 , £g) £ T~L X , and ( x 
onal to ( x , which is equivalent to g x (J; x , ( x ) 

CO} and CO} . 


(Cu 1 1 Cu? , Cu 3 , Ce) G V x . Since £ x must be orthog- 
0 in (A. 21 , the characterization for is derived from 


9x (; Cc) 


(£Di,Cui(GiGf)) + (£u 2 ,Cu 2 (G 2 G;f)) + (Cu 3 ,Cu 3 (G 3 G 3 )) + (£g,Ce> 

(£u,t/Ui(GiGf)) + (^u 2 ,tyu 2 (G 2 G^)) + (£u 3 ,77u 3 (G 3 G3 )) 

+(£g, — (£? x iJli + Qx 2 fi 2 + Q x 3 G 3 )) 


(^Ui) ? 7Ui(GiGf)) + (^ U2 , ?tu 2 (G 2 G^)) + (£u 3 ,t?u 3 (G 3 G3 )) 

+ (Ce, —t/Xifii) + (£g, -Qx 2 Gl 2 ) + (£g, ^t?X 3 f2 3 ) 

(We switch to unfoldings of Q.) 


Trace((G 1 Gf)^ 1 (U 1 f2 1 ))+Trace((G 2 Gj)^ 2 (U 2 ft 2 )) 

+Trace((G 3 Gj)^ 3 (U 3 D 3 )) 

+Trace(£ G i(~tCGi) T ) + Trace(£ G 2 (-~ri 2 G 2 ) T ) + Trace(^ G 3 (-fl 3 G 3 ) T ) 


Trace 


GtG^Ur+^Gfln 


+Trace 
+Trace 


{(G 2 GM a U 2 + & 2 G^}n 2 

{(G 3 Gg )^u 3 U 3 + £g 3 GJ} fi 3 


where } GiI is the mode-d unfolding of £g. Since g x (£ x ,(x) above should be zero for all skew- 
matrices £ x = ((ui, Cu 2 , £u 3 , ) £ 'Hx must satisfy 

(G d G d )^U d + £,c d G d is symmetric for d £ {1,2,3}. (A. 8 ) 


A.5 Derivation of the tangent space projector 

The tangent space T X M. projector is obtained by extracting the component normal to T x Ai in 
the ambient space. The normal space N X A4 has the matrix characterization shown in ( |A. 6 } . 
The operator T* : K raiXri x R n ^xr 2 x R n 3 xr 3 x R nxr 2 xr 3 T ^ M . (y Gi , Y U2 , Y U3 , Yg) 
ha ^(Yjjj, Yu 2 , Yu 3 , Yg) has the expression 

’MY Ul ,Y U 2 ,Y U 3 ,Y g ) = (Y Ul -U 1 S Ul (G 1 GD- 1 ,Y U 2 -U 2 S U 2 (G 2 GT)- 1 , 

Yu 3 —U 3 Su 3 (G 3 GJ) _1 , Yg). 
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From the definition of the tangent space in ( |A. 1| >, U,/ should satisfy 

VTj d V d +UZvu d = (Y^-U.Su^GrfGjj-^^ + uKY^-UdSuJG.Gl)- 1 ) 

= Y^U d - (GrfG^-^UjUd + UjYu, -UjUdSu^G ^)" 1 
= Y£ d U d - (G d Gj)- l S Ud + UjY Ud - Su^GrfGj)- 1 = 0. 

Multiplying (G^GJ) from the right and left sides results in 

(GdGj) _1 S Ud +S Ud (G d Gj) -1 = Yj d Ud + UlY Ud 

S Ud GdGj + GdGjS Ud = GdG^Y^Ud + UjYuJGdGj. 

Finally, we obtain the Lyapunov equation as 

S Ud GdGj + G (i GjS Ud = GdGj(Y5 d Ud + UjY Ud )G £i Gj for d £ {1,2,3}, (A.10) 

that are solved efficiently with the Matlab’s lyap routine. 


A.6 Derivation of the horizontal space projector 

We consider the projection of a tangent vector rj x = (t?Ui, Pu 2 j ? 7u 3 , Vg ) £ T X M into a vector 
= (■fu,, £u 2 1 £u 3 , ) £ H x . This is achieved by subtracting the component in the vertical space 

V x in ( |A.7| > as 

PUi = Ply — UiDi +UiD^, 

=Sd 1 6Vx 

* Vv 2 = Vv 2 ~ U 2 D 2 + U 2 D 2 , 

t?U 3 = '7U 3 _ U 3 G 3 + U 3 SI 3 . 

, Vg = VG ~ (—(0XiDi + Gx 2 fl 2 + £ 7 X 3113 )) + (—(^XiDi + Gx 2 fl 2 + ^X 3 ^ 3 ))- 


As a result, the horizontal operator II X : T X M. —> 7 ~L X : r] x 1 —>• 11^; ( 77 ^) has the expression 

RxiVx) = (t?Ui-UiDi,t?u 2 -U 2 D2,r7u 3 -U3D3, 

Vg — (— x 1^1 + Gx 2 fl 2 + G x 3 fl 3 ))), 


(A.l 1) 


where r] x = (vvuVv 2 ,Vv siVg) G T x Ad and fid is a skew-symmetric matrix of siz e ra x r^. The 
skew-matrices fid for d = {1,2, 3} that are identified based on the conditions (A. 81 . 


It should be noted that the tensor Qx if2i + £7 x 2 D 2 4 G x 3 ffi i n (A.7 1 has the following equivalent 
unfoldings. 


Gx 1 fl 1 +Gx 2 fl 2 + gx 3 fl3 0 1 Gi + Gi(I r3 ®D 2 ) T + Gi(D 3 8)Ir 2 ) T 

g 2 (i, 3 ® fli) T + fi 2 G 2 + G 2 (n 3 (g)I n ) T 
-£^4- G 3 (I r2 ® fli) T + G 3 (D 2 ® I ri ) T + n 3 G 3 . 


Plugg ing £ Wl = r/u, - UiDi and £ Gl = 77G1 + fiiGi + Gi(I r3 (8 fl 2 ) T + Gi(D 3 ® I r2 ) T into 
(A. 81 and using the relation (A ® B) T = A T ® B 7 results in 

(GiGfj^U+^Gf 

= (G 1 Gf)(7 7Ul -UrDr^Ur 

+ {pci + (D 1 G 1 + Gi(I r3 ® fl 2 ) T + Gi(H 3 ® I^) 71 )} Gf 
= (G 1 GD^ 1 U 1 -(G 1 Gf)(U 1 D 1 ) T U 1 ‘ (A.12) 

+77 Gl Gf + DiGiG^ + Gi(I r3 ® D 2 ) T G^ + Gi(D 3 ® I r2 ) T Gf 
= (GrGf^Ur + (GiG^flr 

+t? Gi G^ + DiGiG^ - G!(I r3 ® D 2 )G^ - Gi(n 3 ® Ir 2 )G[, 


which should be 

iT 


a symmetric matrix due to (A. 81 , 

((GrGD&U+foG^. 


i.e., (G 1 Gf)^U + ^ Gl Gf = 


13 















Subsequently, 

(GxG^Ui + (GiG^Gi + ?7 Gl G^ + O^G^ - G^ 0 G 2 )G^ - G^Og 0 I r JGf 
= Uf»7D 1 (G 1 G?’)-0 1 GiG?’ + G 1 »?g 1 -G 1 G?’0 1 +G 1 (I r 3®0 2 )G?’ + Gi(G 3 0 I^Gf, 
which is equivalent to 


GiGfOx + OxGiG^ - Gi(I ra 0 G 2 )Gf - Gi(G 3 0 I r JG[ 

= Skew(Ux ttujGiG^) + Skew(Gii?Q 1 ). 

Here Skew(-) extracts the skew-symmetric part of a square matrix, i.e., Skew(D) = (D — D r )/2. 
Finally, we obtain the coupled Lyapunov equations 

' GiGfOx + OxGiGf - Gi(I r3 0 G 2 )Gf - Gi(G 3 0 I r2 )Gf 

= Skew(Ux ryUiGiGf) + Skew(Giry Gi ), 
G 2 G^G 2 + G 2 G 2 G? - G 2 (I r3 0 GOG? 1 - G 2 (G 3 0l ri )G£ 

= Skew(U?’r 7 u 2 G 2 G^) + Skew(G 2 ^ 2 ), 
G 3 G^G 3 + G 3 G 3 Gj - G 3 (I r2 0 Sh)Gl - G 3 (G 2 0 I n )G 3 

= Skew(U J ryu 3 G 3 G J) + Skew(G 3 7?£ 3 ), 


(A. 13) 

that are solved efficiently with the Matlab’s peg routine that is combined with a specific precondi¬ 
tioner resulting from the Gauss-Seidel approximation of (A. 13 i. 


A.7 Derivation of the Riemannian gradient formula 


Let/(AT) = ||Pn(A')-Po(A*)||y|G|and5 = 2(P n (ax 1 U 1 x 2 U 2 x 3 U 3 )-Pn(A'*))/|0|be 
an auxiliary sparse tensor variable that is interpreted as the Euclidean gradient of / in K" 1 X " 2X " :! . 

The partial derivatives of /(Ui, U 2 , U 3 , Q) are 


d/t(Ui,U 2 ,U 3 ,Gi) 

5Ui 

9/ 2 (U!,U 2 ,U 3 ,G 2 ) 

au 2 

< 9/ 3 (Ui,U 2 ,U 3 ,G 3 ) 

5U 3 

df( Ur.Ua.Ua.S) 

dg 


j^|('Pa(UiG 1 (U 3 0 U 2 ) t ) - Pn(X*))(U 3 0 U 2 )Gf 
Si(U 3 0U 2 )Gf, 

p(Pa(U 2 G 2 (U 3 0 Ux) T ) - Po(X*))(U 3 0 Ux)G^ 
S 2 (U 2 0U!)G^ 

p(Pa(U 3 G 3 (U 2 0 Ux) T ) - Po(X*))(U 2 0 
S 3 (U 2 0Ul)Gg , 

p(Pn(a x iUiX 2 U 2 x 3 U 3 ) - Vn{X*)) 

Xl Uf x 2 Uj x 3 Uj 

S XxUf x 2 U 2 T x 3 U 3 , 


where X) is mode-c/ unfolding of AT* and 


Si 




5 2 

5 3 


s 




('Pq(U 1 G 1 (U 3 0 U 2 ) r ) - "Po(Xi)) 
(Pn(U 2 G 2 (U 3 0U 1 ) T )-P n (X*)) 
(Po(U 3 G 3 (U 2 0 Ui) T ) - Po(X^)) 
(7 > n(£x 1 U 1 x 2 U 2 x 3 U 3 ) — Vn{X*)) 


Due to the specific scaled metric (A.2i, the partial derivatives of / are further scaled by 
((GiGf ) _1 , (G 2 G^) _1 , (G 3 G 3 ) _1 ,X), denoted as egrad x f (after scaling), i.e.. 
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egradj = (Sr(U 3 0 U 2 )Gf(GrGf )~\ S 2 (U 3 0 U 1 )G^(G 2 G^)- 1 , 
S 3 (U 2 ®Ui)Gj(G 3 Gj)- 1 : 5x 1 uf x 2 I# x 3 U^). 


Consequently, from the relationship that horizontal lift of grad^j / is equal to grad.,,/ = 
dr (egradj/), we obtain that, using (A.9 1 , 


the horizontal lift of grad [x] / = *(Si(U 3 0 U 2 )Gf (GiG^) -1 , S 2 (U 3 0 Ui)G^(G 2 G£) _1 , 

S 3 (U 2 0Ui)G^(G 3 Gj)- 1 ,5x 1 uf x 2 U^ x 3 U^) 

= (Sr(U 3 0 U 2 )Gf (G 1 G?’)- 1 - UrBu, (GiGf) -1 , 

S 2 (U 3 0 U 1 )G^(G 2 G^)- 1 - u 2 b U2 (g 2 g^)- 1 , 

S 3 (U 2 0 1/ )Gl(G :i Gl)-' - U 3 B U3 (GgG^)- 1 , 

5xiU[ x 2 l# x 3 U 3 ). 


From the requirements in (A.lOi for a vector to be in the tangent space, we have the following 
relationship for mode-1. 


Bu.GrGf 


GiGfB^ 


= GiG^Y^Ui+UfYuJGiGf, 


where Yu, = (Si(U 3 0 U 2 )Gf (GiG^)" 1 . 
Subsequently, 


GiG^Yj^Ux+UfYuJGiG? 


GxG^ {((S 1 (U 3 0U 2 )Gf(G 1 Gf)- 1 ) T U 1 
+Uf (Sr(U 3 0 U 2 )Gf (GiG^)- 1 } GiGf 

((SUUg 0 U 2 )Gf J^iGiG? 1 + G 1 G[U t (S 1 (U 3 0 U 2 )Gf 
(GiGfuf (S!(U 3 0 U 2 )Gf) T + GiGfuf (S^U, 0U 2 )Gf 
2Sym(GiGfuf (Si(U 3 0 U 2 )Gf). 


Finally, Bu d for d £ {1,2,3} are obtained by solving the Lyapunov equations 

f B Ul G 1 G?’+G 1 G?’B Ul = 2Sym(G 1 Gfuf (Si(U 3 0 U 2 )Gf), 

< Bu,G 2 G^ + G 2 G^B U2 = 2Sym(G 2 GM(S 2 (U 3 0U 1 )Gj), 

{ B U3 G 3 Gj+G 3 GjB U3 = 2Sym(G 3 Gjui’(S 3 (U 2 0U 1 )G^). 

where Sym(-) extracts the symmetric part of a square matrix, i.e., Sym(D) = (D + D J )/2. The 
above Lyapunov equations are solved efficiently with the Matlab’s lyap routine. 


B Additional numerical comparisons 

In addition to the representative numerical comparisons in the paper, we show additional numerical 
experiments spanning synthetic and real-world datasets. 

Experiments on synthetic datasets: 

Case S2: small-scale instances. We consider tensors of size 100 x 100 x 100, 150 x 150 x 150, 
and 200 x 200 x 200 and ranks (5,5, 5), (10,10,10), and (15,15,15). OS is {10, 20, 30}. Figures 
|A. 1 [ a)-(c) show the convergence behavior of different algorithms, where (b) is identical to the figure 
in the manuscript paper. Figures |AT} d)-(f) show the mean square error on F on each algorithm. 
Furthermore, Figu re |A.l[ g)-(i) show the mean square error on F when OS is 10 in all the five 
runs. From Figures pV. 1 [ our proposed algorithm is consistently competitive or faster than geomCG, 
HalRTC, and TOpt. In addition, the mean square error on a test set T is consistently competitive or 
lower than that of geomCG and HalRTC, especially for lower sampling ratios, e.g, for OS 10. 

Case S3: large-scale instances. We consider large-scale tensors of size 3000 x 3000 x 3000, 
5000 x 5000 x 5000, and 10000 x 10000 x 10000 and ranks r=(5, 5, 5) and (10,10,10). OS is 10. 
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We compare our proposed algorithm to geomCG. Figure [A72| shows the convergence behavior of the 
algorithms. The proposed algorithm outperforms geomCG in all the cases. 

Case S4: influence of low sampling. We look into problem instances which result from scarcely 
sampled data. Th e tes t requires completing a tensor of size 10000 x 10000 x 10000 and rank 
r= (5,5,5). Figure pO] shows the convergence behavior when OS is {8, 6, 5}. The case of OS = 5 
is particularly interesting. In this case, while the mean square error on T increases for geomCG, the 
proposed algorithm stably decreases the error in all the five runs. 

Case S7: asymmetric instances. We consider instances where dimensions and ranks along certain 
modes are different than others. Two cases are considered. Case (7.a) considers tensors size 20000 x 
7000 x 7000, 30000 x 6000 x 6000, and 40000 x 5000 x 5000 and rank r = (5,5,5). Case (7.b) 
considers a tensor of size 10000 x 10000 x 10000 with ranks r = (7, 6, 6), (10,5,5), and (15,4,4). 
Figures pO| a)-(c) show that the convergence behavior of our proposed algorithm is superior to that 
of geomCG. Our proposed algorithm also outperforms geomCG for the asymmetric rank cases as 
shown in Figure |A~4} d)-(f). 

Case S8: medium-scale instances. We additionally consider medium-scale tensors of size 500 x 
500 x 500,1000 x 1000 x 1000, and 1500 x 1500 x 1500 and ranks r = (5,5,5), (10,10,10), and 
(15,15,15). OS is {10, 20, 30,40}. Our proposed algorithm and geomCG are only compared as 
the other algorith ms ca nnot handle these scales efficiently. Figures [A~5} a)-(c) show the convergence 
behavior. Figures |A3} d)-(f) also show the mean square error on F of rank r = (15,15,15) in all the 
five runs. The proposed algorithm performs better than geomCG in all the cases. 

Experiments on real-world datasets: 


Case Rl: hyperspectral image. We also show the performance of our algorithm on the hype rspec- 
tral image “Ribeira”. We show the mean square error on F when OS is {11, 22} in Figure A.6 where 
(a) is identical to the figure in the manuscript paper. Our proposed algorithm gives lower test errors 


than those obtained by the other algorithms. We also show the image recovery results. Figures A.7 


and A.8 show the reconstructed images when OS is {11, 22}, respectively. From these figures, we 
find that the proposed algorithm shows a good performance, especially for the lower sampling ratio. 


Case R2: MovieLens-lOM. Figure [A~9| shows the convergence plots for all the five runs of ranks 
r = (4,4,4), (6,6,6), (8,8,8) and (10,10,10). These figures show the superior performance of 
our proposed algorithm. 
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Figure A.l: Case S2: small-scale comparisons. 
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Figure A.2: Case S3: large-scale comparisons. 


17 
















































































































































































































































































































































y,» __ ■ , ; _ « < »i _ _ _ , \ *,>\ _ yeumoo 

0 100 200 300 400 0 100 200 300 0 50 100 150 200 250 

Time in seconds Time in seconds Time in seconds 

(a) OS = 8. (b) OS = 6. (c) OS = 5. 


Figure A.3: Case S4: low-sampling comparisons. 
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Figure A.4: Case S7: asymmetric comparisons. 
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Figure A.5: Case S8: medium-scale comparisons. 
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(a) OS = 11. 


(b) OS = 22. 


Figure A.6: Case Rl: mean square error on I ’. 
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Figure A.7: Case Rl: recovery results on the hyperspectral image “Ribeira” (frame = 16, OS = 11). 



(e) HaLRTC. (f) TOpt. (g) Latent. (h) Hard. 

Figure A.8: Case Rl: recovery results on the hyperspectral image “Ribeira” (frame = 16, OS = 22). 
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Figure A.9: Case R2: mean square error on I’. 
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